Calibrated Lazy Associative Classification
Authors
Abstract
Classification is an important problem in data mining. Given an example x and a class c, a classifier usually works by estimating the probability of x belonging to c (the membership probability). Well-calibrated classifiers are those able to provide accurate estimates of class membership probabilities; that is, the estimated probability p̂(c|x) is close to p(c|p̂(c|x)), the true, empirical probability that x belongs to c given that the classifier's estimate is p̂(c|x). Calibration is not a necessary property for producing accurate classifiers, and thus most research has focused on direct accuracy-maximization strategies (e.g., maximum margin) rather than on calibration. However, non-calibrated classifiers are problematic in applications where the reliability associated with a prediction must be taken into account (e.g., cost-sensitive classification, cautious classification). In these applications, sensible use of the classifier must be based on the reliability of its predictions, and thus the classifier must be well calibrated. In this paper we show that lazy associative classifiers (LAC) are accurate and well calibrated when using a well-known, sound entropy-minimization method. We explore important applications where these characteristics (accuracy and calibration) are relevant, and we demonstrate empirically that LAC drastically outperforms other classifiers such as SVMs, Naive Bayes, and Decision Trees (even after these classifiers are calibrated by specific methods). Additional highlights of LAC include the ability to incorporate reliable predictions to improve training and the ability to refrain from doubtful predictions.
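The calibration notion in the abstract, p̂(c|x) being close to p(c|p̂(c|x)), can be checked empirically by binning predicted probabilities and comparing each bin's mean prediction with the observed class frequency. A minimal sketch of such a check (the function name and equal-width binning scheme are illustrative choices, not taken from the paper):

```python
import numpy as np

def empirical_calibration(probs, labels, n_bins=10):
    """Estimate p(c | p_hat) by binning predicted probabilities.

    A classifier is well calibrated when, within each bin, the mean
    predicted probability is close to the observed fraction of
    positive examples. Returns (mean_prediction, observed_rate) pairs.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so that a prediction of 1.0 is counted.
        mask = (probs >= lo) & (probs <= hi if hi >= 1.0 else probs < hi)
        if mask.any():
            rows.append((probs[mask].mean(), labels[mask].mean()))
    return rows
```

On a toy sample where 10% of the instances predicted at 0.1 are positive and 90% of those predicted at 0.9 are positive, the two returned pairs are close to (0.1, 0.1) and (0.9, 0.9), i.e., the predictions are calibrated on this sample.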
Related Papers
Lazy Associative Graph Classification
In this paper, we introduce a modification of the lazy associative classification which addresses the graph classification problem. To deal with intersections of large graphs, graph intersections are approximated with all common subgraphs up to a fixed size similarly to what is done with graphlet kernels. We illustrate the algorithm with a toy example and describe our experiments with a predict...
Eager, Lazy and Hybrid Algorithms for Multi-Criteria Associative Classification
Classification aims to map a data instance to its appropriate class (or label). In associative classification the mapping is done through an association rule with the consequent restricted to the class attribute. Eager associative classification algorithms build a single rule set during the training phase, and this rule set is used to classify all test instances. Lazy algorithms, however, do no...
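The eager/lazy distinction above can be made concrete with a toy lazy scheme: instead of mining a global rule set up front, rules are induced per test instance, using only antecedents drawn from that instance's own features, and the most confident matching rule assigns the class. This is a hypothetical illustration, not the algorithm of any of the cited papers:

```python
from collections import Counter
from itertools import combinations

def lazy_classify(train, instance, min_support=1):
    """Toy lazy associative classifier.

    train: list of (feature_set, label) pairs.
    instance: feature set of the test example.
    Only antecedents built from the test instance's features are
    considered (a projection of the training data), so rule induction
    is deferred to classification time. Ties in confidence are broken
    by support.
    """
    best = (0.0, 0, None)  # (confidence, support, class)
    feats = sorted(set(instance))
    for r in range(1, len(feats) + 1):
        for antecedent in combinations(feats, r):
            a = set(antecedent)
            covered = [label for fs, label in train if a <= set(fs)]
            if len(covered) < min_support:
                continue
            label, count = Counter(covered).most_common(1)[0]
            conf = count / len(covered)
            if (conf, len(covered)) > best[:2]:
                best = (conf, len(covered), label)
    return best[2]
```

For example, with training data [({'a','b'}, 'pos'), ({'a'}, 'neg'), ({'b','c'}, 'pos')], classifying {'a','b'} picks the rule {b} → pos (confidence 1.0, support 2) over the weaker rule {a} → pos.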
Graphlet-based lazy associative graph classification
The paper addresses the graph classification problem and introduces a modification of the lazy associative classification method to efficiently handle intersections of graphs. Graph intersections are approximated with all common subgraphs up to a fixed size similarly to what is done with graphlet kernels. We explain the idea of the algorithm with a toy example and describe our experiments with ...
Rule Pruning in Associative Classification Mining
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, we survey different rule pruning methods used by associative classification techniques. Furthermore, we compare the effect of three pruning methods (database coverage...
Pruning Techniques in Associative Classification: Survey and Comparison
Association rule discovery and classification are common data mining tasks. Integrating association rules and classification, also known as associative classification, is a promising approach that derives classifiers highly competitive in accuracy with traditional classification approaches such as rule induction and decision trees. However, the size of the classifiers generated ...
Publication year: 2008